
    A performance focused, development friendly and model aided parallelization strategy for scientific applications

    The evolution of high performance computing platforms has provided unprecedented computing power through multi-core CPUs, massively parallel architectures such as General Purpose Graphics Processing Units (GPGPUs), and Many Integrated Core (MIC) architectures such as Intel's Xeon Phi coprocessor. However, leveraging the capabilities of such advanced supercomputing hardware is a great challenge, as it requires efficient and effective parallelization of scientific applications. This task is difficult mainly due to the complexity of scientific algorithms coupled with the variety of available hardware and disparate programming models. To address these challenges, this thesis presents a parallelization strategy that accelerates scientific applications by maximizing the opportunity for speedup while minimizing development effort. Parallelization is a three-step process: (1) choose a compatible combination of architecture and parallel programming language, (2) translate the base code/algorithm to the parallel language, and (3) optimize and tune the application. In this research, a quantitative comparison of the runtime of various implementations of the k-means algorithm is used to establish that native languages (OpenMP, MPI, CUDA) perform better on their respective architectures than vendor-neutral languages such as OpenCL. A qualitative model is used to select an optimal architecture for a given application by aligning the capabilities of accelerators with the characteristics of the application. Once the optimal architecture is chosen, the corresponding native language is employed. This approach provides the best performance with reasonable accuracy (78%) in predicting a fitting combination, while eliminating the need to explore each architecture individually. It considerably reduces the required development effort, as the application need not be rewritten in multiple languages; the focus can rest solely on optimization and tuning to achieve the best performance on the available architecture with minimal investment of cost and effort. To verify the prediction accuracy of the qualitative model, the OpenDwarfs benchmark suite, which implements the Berkeley dwarfs in OpenCL, is used. A dwarf is an algorithmic method that captures a pattern of computation and communication. This research focuses on nine applications from various algorithmic domains that cover the seven dwarfs of scientific computing, identified by Phillip Colella as omnipresent in scientific and engineering applications. To validate the parallelization strategy as a whole, a case study is undertaken: parallelization of Lower-Upper (LU) Decomposition for the Gaussian Elimination algorithm from the linear algebra domain, using conventional trial-and-error methods as well as the proposed 'Architecture First, Language Later' strategy. The development efforts incurred by both methods are contrasted; the proposed strategy is observed to reduce development effort by an average of 50%.
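
    As a concrete illustration of the kernel being benchmarked, below is a minimal Python sketch of k-means (not the thesis code; all names and shapes are illustrative). The assignment step is the data-parallel hot loop that maps naturally onto OpenMP threads, MPI ranks, or CUDA thread blocks in the native-language implementations compared above.

```python
# Minimal k-means sketch; illustrative only, not the benchmarked code.
import numpy as np

def assign_clusters(points, centroids):
    # Data-parallel step: distance of every point to every centroid, (n, k).
    d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

def update_centroids(points, labels, centroids):
    new = centroids.copy()
    for j in range(len(centroids)):
        members = points[labels == j]
        if len(members):              # keep the old centroid if a cluster empties
            new[j] = members.mean(axis=0)
    return new

def kmeans(points, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        labels = assign_clusters(points, centroids)
        centroids = update_centroids(points, labels, centroids)
    return labels, centroids

if __name__ == "__main__":
    X = np.random.default_rng(1).normal(size=(10_000, 8))
    labels, C = kmeans(X, k=5)
    print(labels[:10], C.shape)
```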

    Analysis of a Gibbs sampler method for model based clustering of gene expression data

    Over the last decade, a large variety of clustering algorithms have been developed to detect coregulatory relationships among genes from microarray gene expression data. Model-based clustering approaches have emerged as statistically well-grounded methods, but the properties of these algorithms when applied to large-scale data sets are not always well understood. An in-depth analysis can reveal important insights about the performance of the algorithm, the expected quality of the output clusters, and the possibilities for extracting more relevant information from a particular data set. We have extended an existing algorithm for model-based clustering of genes to simultaneously cluster genes and conditions, and used three large compendia of gene expression data for S. cerevisiae to analyze its properties. The algorithm uses a Bayesian approach and a Gibbs sampling procedure to iteratively update the cluster assignment of each gene and condition. For large-scale data sets, the posterior distribution is strongly peaked on a limited number of equiprobable clusterings. A GO annotation analysis shows that these local maxima are all equally significant biologically, and that simultaneously clustering genes and conditions performs better than clustering genes alone and assuming independent conditions. A collection of distinct, equivalent clusterings can be summarized as a weighted graph on the set of genes, from which we extract fuzzy, overlapping clusters using a graph spectral method. The cores of these fuzzy clusters contain tight sets of strongly coexpressed genes, while the overlaps exhibit relations between genes showing only partial coexpression.
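
    A hedged sketch of the core update described above: each gene's cluster label is resampled from its full conditional given the current assignments of all other genes. A one-dimensional Gaussian likelihood with fixed variance and a uniform prior are simplifying assumptions for illustration, not the paper's exact model.

```python
# One Gibbs sweep over gene cluster labels; simplified illustrative model.
import numpy as np

def gibbs_sweep(x, z, k, sigma=1.0, rng=None):
    # x: one expression value per gene; z: current cluster label per gene.
    rng = rng or np.random.default_rng()
    for i in range(len(x)):
        z[i] = -1                                   # hold gene i out of its cluster
        logp = np.empty(k)
        for j in range(k):
            members = x[z == j]
            mu = members.mean() if len(members) else 0.0
            logp[j] = -0.5 * ((x[i] - mu) / sigma) ** 2
        p = np.exp(logp - logp.max())               # stabilized softmax
        z[i] = rng.choice(k, p=p / p.sum())         # resample from the conditional
    return z

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 50), rng.normal(5, 1, 50)])
z = rng.integers(0, 2, size=100)
for _ in range(20):
    z = gibbs_sweep(x, z, k=2, rng=rng)
print(np.bincount(z, minlength=2))
```

    Repeated sweeps yield samples from the posterior over clusterings; the peaked, equiprobable local maxima reported above correspond to distinct high-probability assignments found across independent runs.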

    Concerted bioinformatic analysis of the genome-scale blood transcription factor compendium reveals new control mechanisms.

    Transcription factors play a key role in development and disease. ChIP-sequencing has become a preferred technique for investigating genome-wide binding patterns of transcription factors in vivo. Although this technology has led to many important discoveries, the rapidly increasing number of publicly available ChIP-sequencing datasets remains a largely unexplored resource. Using a compendium of 144 publicly available murine ChIP-sequencing datasets in blood, we show that systematic bioinformatic analysis can unravel diverse aspects of transcription regulation: from genome-wide binding preferences, through finding regulatory partners and assembling regulatory complexes, to identifying novel functions of transcription factors and investigating transcription dynamics during development. This is the final version as published by the Royal Society of Chemistry in Molecular BioSystems: http://pubs.rsc.org/en/Content/ArticleLanding/2014/MB/C4MB00354C#divAbstract
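
    One analysis this kind of compendium enables is scoring candidate regulatory partners by the overlap of their binding sites. Below is a small hedged sketch, with peaks modeled as sets of fixed-width genomic bins and all coordinates invented for illustration.

```python
# Score co-binding of two TFs by Jaccard overlap of binned peak sets.
def to_bins(peaks, size=200):
    # Collapse (chrom, start, end) peaks onto fixed-width genomic bins.
    bins = set()
    for chrom, start, end in peaks:
        bins.update((chrom, b) for b in range(start // size, end // size + 1))
    return bins

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

tf_a = to_bins([("chr1", 1000, 1400), ("chr2", 5000, 5300)])
tf_b = to_bins([("chr1", 1100, 1500), ("chr3", 100, 400)])
print(f"co-binding score: {jaccard(tf_a, tf_b):.2f}")
```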

    Comprehensive analysis of epigenetic signatures of human transcription control

    Advances in sequencing technologies have enabled exploration of epigenetic and transcriptional profiles at a genome-wide level. Epigenetic and transcriptional landscapes are now available for hundreds of mammalian cell and tissue contexts. Many studies have performed multi-omics analyses using these datasets to enhance our understanding of the relationships between epigenetic modifications and transcription regulation. Nevertheless, most studies so far have focused on promoters/enhancers and transcription start sites, while other features of transcription control, including exons, introns, and transcription termination, remain underexplored. We investigated the interplay between epigenetic modifications and diverse transcription features using the data generated by the Roadmap Epigenomics project. A comprehensive analysis of histone modifications, DNA methylation, and RNA-seq data for thirty-three human cell lines and tissue types allowed us to confirm the generality of previously described relationships, as well as to generate new hypotheses about the interplay between epigenetic modifications and transcription features. Importantly, our analysis included previously underexplored features of transcription control, namely transcription termination sites, exon–intron boundaries, and the exon inclusion ratio. We have made the analyses freely available to the scientific community at joshiapps.cbu.uib.no/perepigenomics_app/ for easy exploration, validation, and hypothesis generation.
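
    The exon inclusion ratio ("percent spliced in", PSI) named above has a standard junction-count definition, sketched below; the normalization factor and read counts are illustrative assumptions, not values from the study.

```python
# PSI for a cassette exon from junction-spanning read counts.
def psi(inclusion_reads, exclusion_reads, inclusion_junctions=2):
    # Inclusion reads span two junctions (upstream and downstream of the
    # exon) while skipping reads span one, so inclusion is normalized.
    inc = inclusion_reads / inclusion_junctions
    return inc / (inc + exclusion_reads)

print(psi(inclusion_reads=80, exclusion_reads=20))  # 0.666...
```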

    From genotype to phenotype: Through chromatin

    Advances in sequencing technologies have enabled the exploration of the genetic basis of several clinical disorders by allowing identification of causal mutations in rare genetic diseases. Sequencing technology has also facilitated genome-wide association studies to gather single nucleotide polymorphisms in common diseases, including cancer and diabetes. Sequencing has therefore become common in the clinic for both prognostics and diagnostics. Success in the follow-up steps, i.e., mapping mutations to causal genes and therapeutic targets to further the development of novel therapies, has nevertheless been very limited. This is because most mutations associated with diseases lie in intergenic regions, including the so-called regulatory genome. Additionally, no genetic causes are apparent for many diseases, including neurodegenerative disorders. A complementary approach is therefore gaining interest, namely to focus on the epigenetic control of disease to generate more complete functional genomic maps. To this end, several recent studies have generated large-scale epigenetic datasets in a disease context to form a link between genotype and phenotype. We focus on DNA methylation and important histone marks, where recent advances have been made thanks to technology improvements, cost effectiveness, and large meta-scale epigenome consortia efforts. We summarize recent studies unravelling the mechanisms of epigenetic processes in disease development and progression. Moreover, we show how methodological advancements enable causal relationships to be established, and we pinpoint the most important issues to be addressed by future research.

    Multi-omics and machine learning for the prevention and management of female reproductive health

    Females typically carry most of the burden of reproduction in mammals. In humans, this burden is exacerbated further, as the evolutionary advantage of a large and complex human brain came at a great cost to women's reproductive health. Pregnancy thus became a highly demanding phase of a woman's life cycle, both physically and emotionally, and therefore needs monitoring to assure an optimal outcome. Moreover, an increasing societal trend towards reproductive complications, partly due to increasing maternal age and the global obesity pandemic, demands closer monitoring of female reproductive health. This review first provides an overview of female reproductive biology and then explores the utilization of large-scale data analysis and -omics techniques (genomics, transcriptomics, proteomics, and metabolomics) for the diagnosis, prognosis, and management of female reproductive disorders. In addition, we explore machine learning approaches for predictive models towards prevention and management. Furthermore, mobile apps and wearable devices promise continuous monitoring of health. These complementary technologies can be combined to monitor female (fertility-related) health and detect early complications, providing intervention solutions. In summary, technological advances (e.g., omics and wearables) have shown promise for the diagnosis, prognosis, and management of female reproductive disorders. Systematic integration of these technologies into female reproductive healthcare is urgently needed, so that they can be implemented in national healthcare systems for societal benefit.
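
    A hedged sketch of the kind of predictive model the review surveys: a cross-validated classifier on omics-style features. The data below are synthetic; no real cohort, biomarker panel, or reported performance is implied.

```python
# Cross-validated logistic regression on synthetic omics-style features.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))            # e.g. 50 metabolite/transcript levels
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=200) > 0).astype(int)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
print("mean CV accuracy:", cross_val_score(model, X, y, cv=5).mean())
```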

    A NEW REVERSED-PHASE HIGH-PERFORMANCE LIQUID CHROMATOGRAPHY METHOD FOR THE SIMULTANEOUS ESTIMATION OF SERRATIOPEPTIDASE AND DICLOFENAC SODIUM IN BULK AND TABLET DOSAGE FORM

    Objective: To develop a simple, rapid, specific, precise, and accurate reversed-phase high-performance liquid chromatography (RP-HPLC) method for the simultaneous estimation of serratiopeptidase (SER) and diclofenac (DC) sodium in bulk and tablet formulation.
    Methods: An RP-HPLC method was developed for the simultaneous estimation of SER and DC sodium in tablet formulation. Separation was achieved on a Kromasil C18 column (250 mm × 4.6 mm, 5 μm particle size) with phosphate buffer (pH 7, adjusted with o-phosphoric acid):methanol:acetonitrile (5:4:1, v/v/v). The flow rate was maintained at 1 mL/min and UV detection was carried out at 270 nm.
    Results: The retention times for SER and DC sodium were found to be 3.3833 min and 8.1667 min, respectively. The method was validated for accuracy, precision, and specificity. Linearity for SER and DC sodium was in the range of 5–50 μg/ml.
    Conclusion: The developed RP-HPLC method is simple, accurate, rapid, sensitive, precise, and economical. Hence, this method can be employed successfully for the estimation of SER and DC sodium in both bulk and tablet dosage forms.
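
    A worked sketch of the linearity check behind the reported 5–50 μg/ml range: fit peak area against concentration and inspect the coefficient of determination. The peak areas below are invented for illustration.

```python
# Least-squares calibration line and r² for an HPLC linearity check.
import numpy as np

conc = np.array([5, 10, 20, 30, 40, 50], dtype=float)   # μg/ml standards
area = np.array([152, 305, 612, 910, 1218, 1515], dtype=float)  # invented peak areas

slope, intercept = np.polyfit(conc, area, 1)
pred = slope * conc + intercept
r2 = 1 - ((area - pred) ** 2).sum() / ((area - area.mean()) ** 2).sum()
print(f"area = {slope:.2f} * conc + {intercept:.2f}, r² = {r2:.4f}")
```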

    AN EXPERIMENTAL DESIGN APPROACH FOR OPTIMIZATION OF MODIFIED COLORIMETRIC FIRST-ORDER DERIVATIVE METHOD FOR ESTIMATION OF SERRALYSIN IN BULK AND PHARMACEUTICAL FORMULATION

    Objective: The aim of the present work is to use experimental design to screen and optimize experimental variables for developing a colorimetric first-order derivative method for determining the content of serralysin (SER), using biuret and Folin–Ciocalteu phenol reagents for stable color development. The method is based on the reaction of the peptide bonds in the protein with biuret reagent in alkaline medium, and the further reaction of the remaining tryptophan and tyrosine residues with Folin–Ciocalteu phenol reagent, to form a stable blue-colored complex (first-order derivative λmax 620 nm).
    Materials and Methods: A two-level full factorial design was utilized to screen the effects of the volume of NaOH (A), the volume of biuret reagent (B), the volume of Folin–Ciocalteu phenol reagent (C), and the concentration of NaOH (D) on the formation of the blue-colored SER–reagent complex (response: absorbance). A Box–Behnken experimental design with response surface methodology was then utilized to evaluate the main, interaction, and quadratic effects of these factors on the selected response.
    Results: With the help of response surface and contour plots, the optimum values of the selected factors were determined and used for further experiments: a volume of NaOH (A) of 1.0 mL, a volume of biuret reagent (B) of 0.25 mL, and a volume of Folin–Ciocalteu phenol reagent (C) of 10 μL. The proposed method was validated according to the ICH Q2(R1) method validation guidelines. The developed colorimetric first-order derivative method was found to be simple, accurate, rapid, sensitive, precise, and economical. Further optimization of the method with the experimental design approach makes it convenient for laboratory use.
    Conclusion: The results of the present study clearly show that an experimental design approach can be effectively applied to the optimization of a modified visible spectrophotometric method for the estimation of SER in bulk and in pharmaceutical formulation with the least possible number of experimental runs. The method can be employed successfully for the estimation of SER in both bulk and tablet dosage forms.
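
    A minimal sketch of the Box–Behnken construction used above: for each pair of factors, all four ±1 combinations are run with the remaining factor at its center level, plus replicated center points. The factor labels follow the abstract; the response column would come from measured absorbance.

```python
# Box-Behnken design matrix in coded levels (-1, 0, +1).
from itertools import combinations
import numpy as np

def box_behnken(n_factors, centers=3):
    runs = []
    for i, j in combinations(range(n_factors), 2):
        for a in (-1, 1):
            for b in (-1, 1):
                row = [0] * n_factors
                row[i], row[j] = a, b
                runs.append(row)
    runs += [[0] * n_factors] * centers
    return np.array(runs, dtype=float)

# A: volume of NaOH, B: volume of biuret reagent, C: volume of FC reagent.
design = box_behnken(3)
print(design.shape)   # (15, 3): 12 edge-midpoint runs + 3 center replicates
```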